Compare Page

Precision

Characteristic Name: Precision
Dimension: Accuracy
Description: Attribute values should be accurate as per linguistics and granularity
Granularity: Element
Implementation Type: Rule-based approach
Characteristic Type: Declarative

Verification Metric:

The number of tasks that failed or underperformed due to lack of data precision
The number of complaints received due to lack of data precision


The implementation guidelines are guidelines to follow in regard to the characteristic. The scenarios are examples of the implementation.

Guideline: Ensure the data values are correct to the right level of detail or granularity.
Scenario: (1) Price to the penny or weight to the nearest tenth of a gram. (2) Precision of the values of an attribute according to some general-purpose IS-A ontology such as WordNet.

Guideline: Ensure that data is legitimate or valid according to some stable reference source such as a dictionary, thesaurus, or code.
Scenario: (1) Spelling and syntax of a description are correct as per the dictionary/thesaurus/code (e.g. NYSIIS code). (2) Address is consistent with the global address book.

Guideline: Ensure that the user interfaces provide the precision required by the task.
Scenario: (1) If the domain is infinite (the rational numbers, for example), then no string format of finite length can represent all possible values.

Guideline: Ensure the data values are lexically, syntactically and semantically correct.
Scenario: (1) “Germany is an African country” (semantically wrong); Book.title: ‘De la Mancha Don Quixote’ (syntactically wrong); UK’s Prime Minister: ‘Toni Blair’ (lexically wrong).
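The first two guidelines lend themselves to simple rule-based checks. The following is a minimal sketch, not a definitive implementation: the function names and the small reference set are hypothetical, and a real system would load its reference source (dictionary, thesaurus, or code table) from elsewhere.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical reference list standing in for a dictionary/thesaurus/code table.
REFERENCE_COUNTRIES = {"Germany", "France", "Kenya"}


def has_exact_scale(value, decimals):
    """Return True if the decimal string carries exactly `decimals` fractional digits."""
    try:
        exponent = Decimal(value).as_tuple().exponent
    except InvalidOperation:
        # Not a parseable number at all, so it cannot meet the precision rule.
        return False
    return exponent == -decimals


def is_in_reference(value, reference):
    """Return True if the value appears in the reference source."""
    return value in reference


# A price recorded to the penny passes; one recorded to a tenth does not.
print(has_exact_scale("19.99", 2))
print(has_exact_scale("19.9", 2))
# A misspelled country name is not found in the reference source.
print(is_in_reference("Germny", REFERENCE_COUNTRIES))
```

Values are passed as strings so that the number of recorded fractional digits is preserved; a binary float would not retain that information.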

Validation Metric:

How mature is the creation and implementation of the DQ rules to maintain data precision?

These are examples of how the characteristic might occur in a database.

Example: If v = Jack, even if the correct value v′ = John, v is considered syntactically correct, as Jack is an admissible value in the domain of persons’ names.
Source: C. Batini and M. Scannapieco, “Data Quality: Concepts, Methodologies, and Techniques”, Springer, 2006.
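The Jack/John example treats a value as syntactically correct when it is an admissible member of the domain, regardless of whether it is the true value. A rough sketch of such a check follows, assuming a small hypothetical name domain; the suggestion step uses the standard library's `difflib.get_close_matches` as one possible similarity measure, which is a choice made here and not something the source prescribes.

```python
import difflib

# Hypothetical domain of admissible person names.
NAME_DOMAIN = ["Jack", "John", "Tony", "Mary"]


def is_syntactically_correct(value, domain):
    """A value is syntactically correct if it is an admissible domain value."""
    return value in domain


def closest_domain_value(value, domain, cutoff=0.6):
    """Return the most similar admissible value, or None if nothing is close enough."""
    matches = difflib.get_close_matches(value, domain, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# "Jack" is admissible even if the person is really called John.
print(is_syntactically_correct("Jack", NAME_DOMAIN))
# "Jck" is not admissible; the nearest admissible value can be suggested.
print(is_syntactically_correct("Jck", NAME_DOMAIN))
print(closest_domain_value("Jck", NAME_DOMAIN))
```

Note that this check says nothing about semantic accuracy: it cannot tell that "Jack" should have been "John".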

The Definitions are examples of the characteristic that appear in the sources provided.

Definition: Data values are correct to the right level of detail or granularity, such as price to the penny or weight to the nearest tenth of a gram.
Source: ENGLISH, L. P. 2009. Information quality applied: Best practices for improving business information, processes and systems, Wiley Publishing.

Definition: Data is correct if it conveys a lexically, syntactically and semantically correct statement; e.g., the following pieces of information are not correct: “Germany is an African country” (semantically wrong); Book.title: ‘De la Mancha Don Quixote’ (syntactically wrong); UK’s Prime Minister: ‘Toni Blair’ (lexically wrong).
Source: KIMBALL, R. & CASERTA, J. 2004. The data warehouse ETL toolkit: practical techniques for extracting, cleaning, conforming, and delivering data.

Definition: The set S should be sufficiently precise to distinguish among elements in the domain that must be distinguished by users. This dimension makes clear why icons and colors are of limited use when domains are large. But problems can and do arise for the other formats as well, because many formats are not one-to-one functions. For example, if the domain is infinite (the rational numbers, for example), then no string format of finite length can represent all possible values. The trick is to provide the precision to meet user needs.
Source: LOSHIN, D. 2001. Enterprise knowledge management: The data quality approach, Morgan Kaufmann Pub.

Definition: Is the information to the point, void of unnecessary elements?
Source: LOSHIN, D. 2006. Monitoring Data quality Performance using Data Quality Metrics. Informatica Corporation.

Definition: The degree of precision of the presentation of an attribute’s value should reasonably match the degree of precision of the value being displayed. The user should be able to see any value the attribute may take and also be able to distinguish different values.
Source: REDMAN, T. C. 1997. Data quality for the information age, Artech House, Inc.

Definition: The granularity or precision of the model or content values of an information object according to some general-purpose IS-A ontology such as WordNet.
Source: STVILIA, B., GASSER, L., TWIDALE, M. B. & SMITH, L. C. 2007. A framework for information quality assessment. Journal of the American Society for Information Science and Technology, 58, 1720-1733.
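The definitions above that concern presentation formats (a format that is not one-to-one cannot distinguish every value users need to distinguish) can be illustrated with a small check. This is only a sketch under stated assumptions: the function name is hypothetical, and a fixed number of decimal places stands in for "a string format of finite length".

```python
def format_distinguishes(values, decimals):
    """Return True if rendering with `decimals` fractional digits keeps distinct values distinct."""
    distinct = set(values)
    # Render every distinct value with the fixed-width display format.
    rendered = {f"{v:.{decimals}f}" for v in distinct}
    # If two values collapse to the same string, the format is not one-to-one on this set.
    return len(rendered) == len(distinct)


# Hypothetical weights in grams that users need to tell apart.
weights = [1.23, 1.27, 1.31]

# One decimal place merges 1.27 and 1.31 into "1.3", so users cannot distinguish them.
print(format_distinguishes(weights, 1))
# Two decimal places keep all three values apart.
print(format_distinguishes(weights, 2))
```

The check is relative to the values that must be distinguished for the task, which matches the point that the aim is precision sufficient for user needs, not unlimited precision.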

 

Usefulness and relevance

Characteristic Name: Usefulness and relevance
Dimension: Usability and Interpretability
Description: The data is useful and relevant for the task at hand
Granularity: Information object
Implementation Type: Process-based approach
Characteristic Type: Usage

Verification Metric:

The number of tasks that failed or underperformed due to the lack of usefulness and relevance of data
The number of complaints received due to the lack of usefulness and relevance of data


The implementation guidelines are guidelines to follow in regard to the characteristic. The scenarios are examples of the implementation.

Guideline: Define the content of the information object based on the user requirements (as required by the task at hand), also considering all other compliance requirements, so that the information is relevant and legitimate.
Scenario: (1) A customer invoice should contain the information needed for the customer to understand their liability, for the delivery person to understand the point of delivery, and for the tax department to verify the applicable tax amount.

Guideline: Regularly monitor changes to the internal operational environment (business process changes, etc.), identify the new information requirements that emerge from those changes, and provide for them by amending the information structures.
Scenario: (1) The time stamp became an important attribute for GRNs (goods receipt notes) when lean manufacturing started, as all raw materials are expected to be received no later than six hours before production (record: GRN; attribute: time stamp).

Guideline: Regularly monitor changes in the external environment, identify the new information requirements that emerge from such changes, and provide for those data needs.
Scenario: (1) Competitors' rates became important for pricing existing products during the recession, since the traditional costing method does not give a competitive price.

Guideline: Regularly check with knowledge workers to find out how their operations and decisions could be performed better if new data were available to them, and provide for such data in the information system.
Scenario: (1) An hourly work progress report is useful for identifying bottlenecks in production lines and balancing the lines.

Guideline: Monitor and measure user satisfaction with the information provided.
Scenario: (1) User satisfaction survey.

Validation Metric:

How mature is the process to maintain usefulness and relevance of data?

These are examples of how the characteristic might occur in a database.

The Definitions are examples of the characteristic that appear in the sources provided.

Definition:
1) The Characteristic in which the Information is the right kind of Information that adds value to the task at hand, such as to perform a process or make a decision.
2) Knowledge Workers have all the Facts they need to perform their processes or make their decisions.
Source: ENGLISH, L. P. 2009. Information quality applied: Best practices for improving business information, processes and systems, Wiley Publishing.

Definition:
1) Can the information process be adapted by the information consumer?
2) Can the information be directly applied? Is it useful?
3) Does the information provision correspond to the user’s needs and habits?
Source: EPPLER, M. J. 2006. Managing information quality: increasing the value of information in knowledge-intensive products and processes, Springer.

Definition: Relevance of data refers to the extent to which the data meets the needs of users. Information needs may change, and it is important that reviews take place to ensure data collected is still relevant for decision makers.
Source: HIQA 2011. International Review of Data Quality. Health Information and Quality Authority (HIQA), Ireland. http://www.hiqa.ie/press-release/2011-04-28-international-review-data-quality.

Definition: Relevance is the degree to which statistics meet current and potential users’ needs. It refers to whether all statistics that are needed are produced and the extent to which concepts used (definitions, classifications etc.) reflect user needs.
Source: LYON, M. 2008. Assessing Data Quality, Monetary and Financial Statistics. Bank of England. http://www.bankofengland.co.uk/statistics/Documents/ms/articles/art1mar08.pdf.

Definition: The data includes all of the types of information important for its use.
Source: PRICE, R. J. & SHANKS, G. Empirical refinement of a semiotic information quality framework. System Sciences, 2005. HICSS'05. Proceedings of the 38th Annual Hawaii International Conference on, 2005. IEEE, 216a-216a.

Definition:
1) Intrinsic: The extent to which the information is new or informative in the context of a particular activity or community.
2) Relational Contextual: The amount of information contained in an information object. At the content level, it is measured as a ratio of the size of the informative content (measured in word terms that are stemmed and stopped) to the overall size of an information object. At the schema level, it is measured as a ratio of the number of unique elements over the total number of elements in the object.
3) The extent to which information is applicable in a given activity.
4) The extent to which the model or schema and content of an information object are expressed by conventional, typified terms and forms according to some general-purpose reference source.
Source: STVILIA, B., GASSER, L., TWIDALE, M. B. & SMITH, L. C. 2007. A framework for information quality assessment. Journal of the American Society for Information Science and Technology, 58, 1720-1733.

Definition:
1) Data are applicable and useful for the task at hand.
2) The quantity or volume of available data is appropriate.
3) Data are of sufficient depth, breadth and scope for the task at hand.
Source: WANG, R. Y. & STRONG, D. M. 1996. Beyond accuracy: What data quality means to data consumers. Journal of management information systems, 5-33.
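The two ratio measures in the Stvilia et al. definition above (informative content over total content, and unique elements over total elements) can be sketched as follows. This is a simplified illustration, not the authors' method: the stop-word list is a tiny hypothetical one, words are split on whitespace, and stemming is omitted entirely.

```python
# Hypothetical, deliberately tiny stop-word list; a real one would be much larger.
STOP_WORDS = {"the", "a", "an", "of", "is", "to", "and", "in"}


def informative_ratio(text):
    """Content level: share of words that remain after removing stop words (no stemming)."""
    words = text.lower().split()
    if not words:
        return 0.0
    informative = [w for w in words if w not in STOP_WORDS]
    return len(informative) / len(words)


def unique_element_ratio(elements):
    """Schema level: number of unique elements over the total number of elements."""
    if not elements:
        return 0.0
    return len(set(elements)) / len(elements)


# Two of the five words ("price", "item") survive stop-word removal.
print(informative_ratio("the price of the item"))
# Three unique element names out of four in a hypothetical schema.
print(unique_element_ratio(["id", "name", "name", "price"]))
```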